Risk factor analysis and nomogram establishment and verification of brain astrocytoma patients based on SEER database

Astrocytoma is a common brain tumor that can occur in any part of the central nervous system. It is extremely harmful to patients, yet the risk factors for astrocytoma of the brain have not been clearly studied. This study used the SEER database to determine the risk factors affecting the survival of patients with astrocytoma of the brain. Patients diagnosed with brain astrocytoma in the SEER database from 2004 to 2015 were screened by inclusion and exclusion criteria, and the patients retained were classified as low grade or high grade according to the WHO classification. First, the risk factors affecting the survival of patients with low-grade and high-grade brain astrocytoma were analyzed separately by univariate Kaplan–Meier curves and log-rank tests. Second, the data were randomly divided into a training set and a validation set in a 7:3 ratio; the training set was analyzed by univariate and multivariate Cox regression to screen the risk factors affecting survival, and nomograms were established to predict patient survival rates at 3 and 5 years. The area under the ROC curve (AUC), the C-index, and the calibration curve were used to evaluate the discrimination and calibration of the model. The univariate Kaplan–Meier survival curves and log-rank tests showed that the risk factors affecting the prognosis of patients with low-grade astrocytoma included Age, Primary site, Tumor histological type, Grade, Tumor size, Extension, Surgery, Radiation, Chemotherapy and Tumor number; the risk factors affecting the prognosis of patients with high-grade astrocytoma included Age, Primary site, Tumor histological type, Tumor size, Extension, Laterality, Surgery, Radiation, Chemotherapy and Tumor number.
Through Cox regression, the independent risk factors for each grade were screened separately, and nomograms for low-grade and high-grade astrocytoma were successfully established to predict patient survival at 3 and 5 years. For low-grade astrocytoma, the AUC values in the training set were 0.829 and 0.801 and the C-index was 0.818 (95% CI 0.779, 0.857); in the validation set, the AUC values were 0.902 and 0.829 and the C-index was 0.774 (95% CI 0.758, 0.790). For high-grade astrocytoma, the AUC values in the training set were 0.814 and 0.806 and the C-index was 0.774 (95% CI 0.758, 0.790); in the validation set, the AUC values were 0.802 and 0.823 and the C-index was 0.766 (95% CI 0.752, 0.780). The calibration curves of the training and validation sets for both grades were well fitted. This study used data from the SEER database to identify risk factors affecting the survival prognosis of patients with brain astrocytoma, which can provide some guidance for clinicians.


Scientific Reports | (2023) 13:7754 | https://doi.org/10.1038/s41598-023-33537-w www.nature.com/scientificreports/

on both morbidity and mortality in children 5. At present, the traditional clinical treatments for brain tumors are surgery, radiotherapy and chemotherapy [6][7][8]. For astrocytoma, an aggressive tumor with a poor prognosis, reasonable treatment can slightly improve survival, but the risk factors for this tumor have rarely been clearly studied 9. This study used data from a U.S. national public database to analyze risk factors affecting the survival of patients with brain astrocytoma. The SEER database is currently the largest public cancer database, covering approximately 28% of the U.S. population, and it includes basic information about the U.S. population and information about relevant cancer characteristics 10. In recent years, nomograms have been widely used in the prediction of various diseases, especially tumors. They meet the need for integrated models and play a very important role in the current "digital medicine" environment, where they facilitate prognosis prediction for clinicians [11][12][13]. Therefore, this study aims to use data from the SEER database to screen for risk factors affecting the survival of patients with brain astrocytoma and to establish nomogram models of patient survival at 3 and 5 years, so as to assist clinicians in predicting patient prognosis.

Materials and methods
Data source. The data for this study were taken from the SEER database established by the National Cancer Institute. We selected the database containing 13 registries with radiotherapy data, which provided the data needed to complete this study. A total of 6154 patients diagnosed with astrocytoma of the brain from 2004 to 2015 were extracted from the database, and 2214 patients remained after screening according to the inclusion and exclusion criteria. Five types of astrocytoma were included: diffuse astrocytoma, anaplastic astrocytoma, pilocytic astrocytoma, unique astrocytoma variants, and astrocytoma, NOS.
Exclusion criteria. (i) Baseline information (e.g., race) is unknown; (ii) tumor size and tumor number are missing; (iii) survival time is unknown; (iv) proven only at autopsy or death.
Statistical methods. The data extracted from the SEER database were first organized according to the inclusion and exclusion criteria using Excel and classified into low-grade and high-grade brain astrocytoma patients according to the WHO classification. Survival rates were calculated separately for low-grade and high-grade brain astrocytoma patients by the Kaplan-Meier method using R-studio 4.2.2; the effect of each included factor on patient survival was shown by K-M curves, and the log-rank test was used to compare groups within the same variable. The data for low-grade and high-grade astrocytoma were each randomly divided into a training set and a validation set in a 7:3 ratio with R-studio, and χ² tests were performed with SPSS to compare the variables between the training and validation sets. Univariate and multivariate Cox regression analyses were performed on the training set data using R-studio 4.1.1, and a nomogram of the final filtered variables was created with the R packages 'rms', 'foreign', and 'survival'. The area under the ROC curve (AUC) and the C-index were used to evaluate the accuracy of the model; both range from 0 to 1, and values closer to 1 indicate a more accurate model. The calibration curve was used to evaluate the calibration of the model; the closer the calibration curve is to the standard curve, the stronger the predictive ability of the model. Differences were considered statistically significant at P < 0.05, except in the univariate Cox regression, where P < 0.1 was used.
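To make the Kaplan-Meier method named in the statistical methods concrete, the following is a minimal Python sketch of the product-limit estimator. It is not the authors' R code: the function name `kaplan_meier` and the toy follow-up times are assumptions made purely for illustration and are not SEER records.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Product-limit estimate of S(t) at each distinct observed death time.

    times  -- follow-up durations (e.g., months)
    events -- 1 if death was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)      # every patient leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Patients censored at time t still count as at risk at t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve

# Toy follow-up data (hypothetical, for illustration only).
toy_times = [5, 8, 8, 12, 20, 20, 30]
toy_events = [1, 1, 0, 1, 0, 1, 0]
km_curve = kaplan_meier(toy_times, toy_events)
```

Each step multiplies the running survival probability by the fraction of at-risk patients who survive that death time, which is why censored patients lower the denominator at later times without producing a step themselves.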

Results
Comparison of patient baseline features. A total of 2214 patients were included in this study: 539 with low-grade astrocytoma and 1675 with high-grade astrocytoma. Using R-studio 4.2.2, the patients of each grade were randomly split into a training set and a validation set in a 7:3 ratio, giving 379 patients in the low-grade training set and 160 in the low-grade validation set, and 1175 patients in the high-grade training set and 500 in the high-grade validation set. When the variables were compared between the training and validation sets, the χ² tests gave P > 0.05, so the differences were not statistically significant, which is consistent with random assignment of the two sets. Information on the two grades and the results of the χ² tests are shown in Tables 1 and 2.
were all related to poor survival time (Fig. 1).
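The 7:3 random split described in this section can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `split_train_validation`, the seed, and the integer patient identifiers are hypothetical, and the authors performed the split in R. Plain rounding of 70% of 539 gives 377 and 162 rather than the 379 and 160 reported here, so the exact counts evidently depend on the splitting procedure used.

```python
import random

def split_train_validation(records, train_fraction=0.7, seed=2023):
    """Shuffle a copy of the records and cut it at the requested fraction."""
    rng = random.Random(seed)          # fixed seed keeps the split reproducible
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical identifiers standing in for the 539 low-grade patients.
patient_ids = list(range(539))
train_ids, validation_ids = split_train_validation(patient_ids)
```

Fixing the seed matters in practice because the χ² comparison between the two sets is only meaningful if the same split can be regenerated later.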
Risk factor analysis affecting survival in patients with high-grade astrocytoma. By univariate Kaplan-Meier survival curves and log-rank tests, Age (P < 0.0001), Primary site (P < 0.0001), Tumor histology type (P < 0.0001), Tumor size (P < 0.0001), Extension (P < 0.0001), Laterality (P = 0.01), Surgery (P < 0.0001), Radiation (P < 0.0001), Chemotherapy (P < 0.0001) and Tumor number (P < 0.0001) were risk factors for the prognosis of patients with high-grade astrocytoma. The K-M survival curves and log-rank tests showed that Age ≥ 80 years, a primary site in the brainstem, the histological type astrocytoma, NOS, Tumor size < 60 mm, deeper Extension, bilateral tumors, no Surgery, no Radiotherapy or Chemotherapy and Tumor number > 1 were all associated with poorer survival time (Fig. 2).
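The log-rank comparisons reported above can be illustrated with a minimal two-group sketch in Python. It is not the authors' analysis: the function name `logrank_two_groups` and the toy data are assumptions, the study compared more than two groups for several variables, and the p-value here uses the fact that the upper tail of a χ² distribution with one degree of freedom equals erfc(sqrt(x / 2)).

```python
import math

def logrank_two_groups(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    death_times = sorted({t for t, e, _ in data if e})
    observed_a = expected_a = variance = 0.0
    for t in death_times:
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)   # at risk in group A
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)   # at risk in group B
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        d_b = sum(1 for u, e, g in data if u == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n          # deaths expected in A if the groups were alike
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = (observed_a - expected_a) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))   # chi-square tail, 1 degree of freedom
    return chi2, p_value

# Toy data (hypothetical): every death in group A precedes every death in group B.
chi2_stat, p_val = logrank_two_groups([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1])
```

With these toy values the statistic is about 5.05, which falls below the P < 0.05 threshold used in this study even with only three patients per group.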
Single-factor and multi-factor Cox regression results.
Univariate and multivariate Cox regression results for low-grade astrocytoma. Patient data from the low-grade astrocytoma training set (13 variables) were included in the univariate Cox regression analysis, which excluded the gender variable (P > 0.1). To avoid omitting important variables, the 12 variables with P < 0.1 in the univariate Cox regression were included in the multivariate Cox regression. A factor with P < 0.1 in the univariate Cox regression was considered to be associated with patient survival; a factor with P < 0.05 in the multivariate Cox regression was considered an independent factor affecting patient survival. The univariate Cox regression results showed that age greater than 40 years, White race, histological type of tumor, primary site, bilateral tumor, grade II, larger tumor size, deeper extension into the brain, surgery, radiotherapy, chemotherapy, and the number of tumors were factors related to the prognosis and survival of patients; the multivariate Cox regression results showed that older age, bilateral tumors, and radiotherapy and chemotherapy were independent factors affecting patient survival (Table 3).
Univariate and multivariate Cox regression results for high-grade astrocytoma. The univariate Cox regression results for high-grade astrocytoma showed that age greater than 60 years, diffuse astrocytoma, primary site, bilateral tumors, tumor size, deeper extension into the brain, surgery, radiotherapy, chemotherapy, and tumor number were factors related to patient prognosis and survival. The multivariate Cox regression results showed that older age, diffuse astrocytoma, primary site, bilateral tumor, tumor size, deeper extension into the brain, surgery, radiotherapy, and chemotherapy were independent factors affecting patient survival (Table 4).

Creation of nomogram.
The variables screened in the multivariate Cox regression analysis (P < 0.05) were entered into R-studio to create a nomogram model. Each value of each variable corresponds to a score; the scores of all variables are added to obtain a total score, and the total score is then used to predict the survival rate of the patient at 3 and 5 years (Figs. 3, 4).
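The scoring scheme just described can be sketched in Python under explicit assumptions. The coefficients, level names, and the baseline 3-year survival of 0.90 below are hypothetical values chosen for illustration; they are not the estimates in Tables 3 and 4, and the helper names are not part of the 'rms' package. The sketch follows the usual nomogram convention in which the variable with the widest effect spans 0 to 100 points, and it converts the linear predictor to a survival probability through the Cox relation S(t) = S0(t)^exp(lp).

```python
import math

# Hypothetical log hazard ratios relative to each reference level (0.0).
# Illustrative values only; NOT the coefficients reported in this study.
COEFFICIENTS = {
    "age": {"<40": 0.0, "40-59": 0.5, ">=60": 1.0},
    "surgery": {"yes": 0.0, "no": 0.5},
}

def nomogram_points(coefficients):
    """Rescale coefficients so the widest variable spans 0 to 100 points."""
    widest = max(max(lv.values()) - min(lv.values()) for lv in coefficients.values())
    scale = 100.0 / widest
    return {
        var: {level: (beta - min(levels.values())) * scale
              for level, beta in levels.items()}
        for var, levels in coefficients.items()
    }

def total_points(points, patient):
    """Add the points of the patient's level on every variable."""
    return sum(points[var][patient[var]] for var in points)

def predicted_survival(baseline_survival, coefficients, patient):
    """Cox relation: S(t | x) = S0(t) ** exp(linear predictor)."""
    linear_predictor = sum(coefficients[var][patient[var]] for var in coefficients)
    return baseline_survival ** math.exp(linear_predictor)

points = nomogram_points(COEFFICIENTS)
patient = {"age": ">=60", "surgery": "no"}
score = total_points(points, patient)
survival_3y = predicted_survival(0.90, COEFFICIENTS, patient)  # 0.90 is an assumed S0(3 years)
```

Because the points are a linear rescaling of the linear predictor, a higher total score always maps to a lower predicted survival, which is what allows the printed nomogram to be read with a ruler.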

Validation of nomogram.
The area under the ROC curve and C-index were used to evaluate the discrimination of the model, and the calibration curve was used to evaluate the calibration of the model.  (Fig. 8).
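As an illustration of the C-index used for discrimination, the following minimal Python sketch computes Harrell's concordance index by direct pair counting. The function name `concordance_index` and the toy values are assumptions for illustration; the sketch ignores pairs with tied survival times and is not the routine the authors ran in R.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: share of usable pairs in which the patient who died
    earlier was assigned the higher predicted risk (risk ties count 0.5)."""
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue                      # a censored patient cannot anchor a pair
        for j in range(n):
            if times[i] < times[j]:       # i died before j was last seen alive
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / usable

# Toy data (hypothetical): one censored patient and one mis-ranked pair.
toy_c = concordance_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.4, 0.6, 0.1])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the 0 to 1 scale referred to in the statistical methods.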

Discussion
Under the current trend of "digital medicine", it is important for both doctors and patients to use a combination of clinical diagnosis and intelligent means to determine the patient's condition and prognosis related risk factors.
On the one hand, it can help doctors understand the patient's condition in time to provide more appropriate treatment; on the other hand, it helps patients understand their own condition more clearly, which can greatly promote communication between doctors and patients. In recent years, more and more scholars have conducted tumor research by mining the SEER database, generating a variety of tumor prediction models, which may become a new direction for tumor research in the future 14. K-M curve analysis showed that the factors we included had an impact on patient survival in both low-grade and high-grade brain astrocytoma, with the exception of race and gender. The univariate and multivariate Cox regression analyses of the training set data for both grades showed that the absence of radiotherapy and chemotherapy was a protective factor for patients with low-grade brain astrocytoma, with an OR less than 1, whereas the opposite was true for high-grade tumors. This may indicate that patients with certain low-grade tumors survive longer without radiotherapy, while patients with high-grade astrocytoma need radiotherapy to survive longer. This result is consistent with clinical practice and has some clinical significance. The Cox regression results were also consistent with the K-M curves, which supports the reliability of the results. Age was found to be an important factor affecting survival in both low-grade and high-grade brain astrocytoma, and this result is consistent with the findings of other scholars. Previous studies have also found a strong relationship between brain tumors and age 8, with older age predicting a higher risk of disease 15,16.
However, some scholars studying advanced age and brain tumors have found that tumor progression may be slower in elderly people 17, whereas the log-rank test and Cox regression results for both low-grade and high-grade tumors in this study indicated that older patients are more likely to have lower survival rates. Overall, age is an extremely important factor in the prognosis of patients with brain tumors and deserves further study. The gender distribution in this study was relatively balanced, whereas in the racial distribution Whites were overwhelmingly represented. The K-M curve and Cox regression results showed that the differences by sex and race were not statistically significant (P > 0.5). Studies have found that the incidence and mortality of brain tumors in both men and women have decreased year by year in recent years, but no significant differences have been found between sexes or races 18.
The primary site of the patient's brain astrocytoma is also an important factor affecting survival. Comparing the K-M survival curves of low-grade and high-grade brain astrocytoma shows that the survival rate of patients with low-grade tumors is significantly higher than that of patients with high-grade tumors, which is consistent with clinical reality. In this study, most of the tumors were concentrated in the cerebrum, whereas experts who have studied children's brain tumors have found that these tumors, especially astrocytoma, are more common in the cerebellum 19; the difference may be related to the wider age distribution in the data of this study. The age of the patient may therefore affect the distribution of astrocytoma in the brain. The K-M survival curves by tumor histology type showed that, in both low-grade and high-grade brain astrocytoma, the survival rate was highest for patients with pilocytic astrocytoma, a slow-growing benign tumor that generally does not require radiotherapy. The results of Cox regression showed that diffuse astrocytoma was a major risk factor for patient survival, and astrocytoma has a poor prognosis 9,20. In this study, the K-M survival curves also showed that the survival rate of patients with high-grade brain astrocytoma with bilateral tumors was lower than that of patients with unilateral tumors, and that a greater number of tumors, deeper extension, and sequence number were associated with poorer patient survival.
However, this study found that the smaller the tumor, the lower the survival rate of patients, whereas studies on breast cancer 21, adult glioma 16 and peripheral schwannoma 22 have found that larger tumors are related to poor prognosis. This inconsistency with clinical experience may arise because the classification of astrocytic tumors used in this study is not the latest classification standard, and the 2004-2015 database contains no classification standards related to molecular typing. In the current treatment of high-grade or malignant brain tumors 23, surgery combined with simultaneous radiotherapy and chemotherapy can benefit patient survival 7,14. This study yielded an OR greater than 1 for patients without surgery relative to patients with surgery in both low-grade and high-grade tumors, indicating that surgery is associated with a better prognosis, which is consistent with the current conventional treatment of brain tumors in clinical practice. The present study also has some limitations. First, the SEER database itself provides a limited amount of information and does not include any information on genes, so we could not study the prognostic factors of brain tumors at the genetic level 19. Second, with the development of gene sequencing, brain tumors have entered the era of molecular typing, but the data extracted in this study predate 2016 and contain no molecular typing, so molecular typing analysis could not be performed; because different histological types would change the prognosis of patients, this is worth further research in the future.
In conclusion, this study screened the risk factors for patients with low-grade and high-grade brain astrocytoma separately by univariate Kaplan-Meier survival curves, included the risk factors affecting the prognosis of patients with both grades of brain astrocytoma relatively completely, and successfully established the nomograms. The AUC and C-index were high in both the training and validation sets for both grades, and the calibration curves fitted well, indicating that the nomograms have a strong ability to predict the 3-year and 5-year survival rates of patients. However, since the data were obtained from the United States, more studies are needed to verify whether the results can be applied to the Chinese population. The results of this study can provide some reference for clinicians.

Data availability
The data that support the findings of this study are available from the SEER database, but restrictions apply to the availability of these data, which were used under license for the current study (ID: 12533-Nov2021), and so they are not publicly available. Data are, however, available from the authors upon reasonable request and with permission of the SEER database.